# CS250P: Computer Systems Architecture The Hardware/Software Interface

Sang-Woo Jun Fall 2023



### Course outline

- Part 1: The Hardware-Software Interface
  - What makes a 'good' processor?
  - Assembly programming and conventions
- Part 2: Recap of digital design
  - Combinational and sequential circuits
  - How their restrictions influence processor design
- Part 3: Computer Architecture
  - Simple and pipelined processors
  - Out-of-order and explicitly parallel architectures
  - Caches and the memory hierarchy
- Part 4: Computer Systems
  - Operating systems, Virtual memory

### A simple hardware abstraction

#### Is this good architecture?

No: the CPU would spend most of its time idle, waiting for memory.

```
1 CPU instruction: ~0.3 ns (@ 3 GHz)
1 DRAM access: 10 - 100 ns
```
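To put the two latencies above in the same unit, here is a minimal sketch that converts a DRAM access into elapsed clock cycles. The function name is illustrative and not part of the lecture; the 3 GHz clock and the 10 to 100 ns range are the figures quoted above.

```python
def dram_latency_in_cycles(latency_ns, clock_ghz):
    """Clock cycles that elapse during one DRAM access.

    cycles = time x frequency, and ns x GHz conveniently yields cycles.
    """
    return latency_ns * clock_ghz


CLOCK_GHZ = 3  # clock rate quoted in the slide

for latency_ns in (10, 100):  # DRAM access range quoted in the slide
    cycles = dram_latency_in_cycles(latency_ns, CLOCK_GHZ)
    print(f"{latency_ns} ns DRAM access = {cycles} cycles at {CLOCK_GHZ} GHz")
```

Under these numbers, a CPU that goes to DRAM for every access waits roughly 30 to 300 cycles each time, which is the sense in which it is "always idle".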

### A modern server system hardware




### Eight great ideas

- Design for Moore's Law
- Use abstraction to simplify design
- Make the common case fast
- Performance via parallelism
- Performance via pipelining
- Performance via prediction
- Hierarchy of memories
- Dependability via redundancy


















## Great idea: Use abstraction to simplify design

- Abstraction helps us deal with complexity by hiding lower-level detail
  - One of the most fundamental tools in computer science!
  - Examples:
    - Application Programming Interface (API),
    - System calls,
    - Application Binary Interface (ABI),
    - Instruction-Set Architecture

### Below your program

- Application software
  - Written in high-level language (typically)
- System software
  - Compiler: translates HLL code to machine code
  - Operating System: service code
    - Handling input/output
    - Managing memory and storage
    - Scheduling tasks & sharing resources
- Hardware
  - Processor, memory, I/O controllers



### The Instruction Set Architecture

- An Instruction-Set Architecture (ISA) is the abstraction between the software and processor hardware
  - The 'Hardware/Software Interface'
  - Different from 'Microarchitecture', which is how the ISA is implemented
- A consistent ISA allows software to run on different machines of the same architecture
  - e.g., x86 across Intel, AMD, and various speed and power ratings

### Levels of program code

- High-level language
  - Level of abstraction closer to problem domain
  - Provides for productivity and portability
- Assembly language
  - Textual representation of instructions
- Hardware representation
  - Binary digits (bits)
  - Encoded instructions and data

The Instruction Set Architecture (ISA) is the agreement on what the binary machine-language program will do



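High-level language program (in C)

```
/* Swap the adjacent array elements v[k] and v[k+1]. */
void swap(int v[], int k)
{
   int temp;
   temp = v[k];      /* save v[k] */
   v[k] = v[k+1];    /* overwrite v[k] with its neighbour */
   v[k+1] = temp;    /* store the saved value in v[k+1] */
}
```

Assembly language program (for RISC-V)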





Binary machine language program (for RISC-V) 

### A RISC-V Example ("00A9 8933")

- This four-byte binary value (0x00A98933) instructs a RISC-V CPU to
  - add the values in registers x19 and x10, and store the result in x18 (`add x18, x19, x10`)
  - regardless of processor speed, internal implementation, or chip designer
    - Various "microarchitectures" adhere to the same ISA, with different cost/performance/etc. trade-offs
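As a sketch of how the ISA fixes the meaning of those 32 bits, the snippet below splits the word into the standard RISC-V R-type fields. The helper name `decode_r_type` is hypothetical. The field positions and the ADD encoding (opcode 0x33, funct3 = 0, funct7 = 0) follow the base RISC-V integer instruction format.

```python
def decode_r_type(word):
    """Split a 32-bit RISC-V R-type instruction word into its fields."""
    return {
        "opcode": word & 0x7F,          # bits 6:0
        "rd":     (word >> 7) & 0x1F,   # bits 11:7, destination register
        "funct3": (word >> 12) & 0x07,  # bits 14:12
        "rs1":    (word >> 15) & 0x1F,  # bits 19:15, first source register
        "rs2":    (word >> 20) & 0x1F,  # bits 24:20, second source register
        "funct7": (word >> 25) & 0x7F,  # bits 31:25
    }


fields = decode_r_type(0x00A98933)

# opcode 0x33 with funct3 = 0 and funct7 = 0 is the base-ISA ADD encoding
is_add = (fields["opcode"], fields["funct3"], fields["funct7"]) == (0x33, 0, 0)

if is_add:
    print(f"add x{fields['rd']}, x{fields['rs1']}, x{fields['rs2']}")
    # prints: add x18, x19, x10
```

Any conforming RISC-V implementation has to interpret the word this way, whatever its microarchitecture.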



### Some history of ISA

- Early mainframes did not have a concept of ISAs (early 1960s)
  - Each new system had a different hardware-software interface
  - Software for each machine needed to be rebuilt
- IBM System/360 (1964) introduced the concept of ISAs
  - Same ISA shared across five different processor designs (at various costs!)
  - The same OS and software can run on all of them
  - Extremely successful!
- Aside: Intel x86 architecture introduced in 1978
  - Strict backwards compatibility maintained even now (the A20 line...)
  - Clean-slate redesigns were attempted multiple times but failed (iAPX 432, EPIC, ...)

## IBM System/360 Model 20 CPU



Source: Ben Franske, Wikipedia

# CS250P: Computer Systems Architecture What Makes a "Good" ISA?

Sang-Woo Jun Fall 2023



### What makes a 'good' ISA?

- Computer architecture is a complicated art...
  - No one design method leads to a 'best' computer
  - Subject to workloads, use patterns, criteria, operating environment, ...
- Important criteria: Given the same restrictions,
  - High performance!
  - Power efficiency
  - Low cost
  - ...
- May depend on target applications
  - E.g., Apple knows (and cares) more about its software than Intel

## What does it mean to be high-performance?

- In the 90s, CPUs used to compete on clock speed
  - "My 166 MHz processor was faster than your 100 MHz processor!"
  - Not a very representative comparison between different architectures
  - A 2 GHz processor may require 5 instructions to do what a 1 GHz one does in only 2
    - (Or not!)
- Sometimes ISA designers make trade-offs
  - E.g., capability of each instruction vs. circuit simplicity (=> faster clock)
  - Which choice is better?
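The 2 GHz versus 1 GHz remark can be checked with a small sketch. It assumes, purely for illustration, that both processors complete one instruction per cycle; the slides do not state a CPI for this example, and the function name is hypothetical.

```python
def task_time_ns(instructions, cycles_per_instruction, clock_ghz):
    """Time to finish a task: cycles needed divided by cycles per nanosecond."""
    cycles = instructions * cycles_per_instruction
    return cycles / clock_ghz


ASSUMED_CPI = 1.0  # assumption for illustration only, not from the slides

fast_clock = task_time_ns(5, ASSUMED_CPI, 2.0)  # 2 GHz, needs 5 instructions
slow_clock = task_time_ns(2, ASSUMED_CPI, 1.0)  # 1 GHz, needs 2 instructions

print(f"2 GHz machine: {fast_clock} ns, 1 GHz machine: {slow_clock} ns")
```

Under that assumption the 1 GHz machine finishes first (2.0 ns against 2.5 ns), so clock speed alone does not rank processors with different ISAs. A different CPI on either machine could reverse the result, hence the "(Or not!)".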

## What does it mean to be high-performance?

- Let's define performance = 1 / execution time
- Example: time taken to run a program
  - 10s on A, 15s on B
  - Execution Time$_B$ / Execution Time$_A$ = 15s / 10s = 1.5
  - So A is 1.5 times faster than B

$$\frac{Performance_X}{Performance_Y} = \frac{Execution\ time_Y}{Execution\ time_X} = n$$
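The ratio above can be written as a one-line helper. This is a minimal sketch: the function name is illustrative, and the 10 s and 15 s figures are the ones from the example.

```python
def relative_performance(time_x, time_y):
    """How many times faster X is than Y, with performance = 1 / execution time.

    Performance_X / Performance_Y = (1 / time_x) / (1 / time_y) = time_y / time_x
    """
    return time_y / time_x


n = relative_performance(time_x=10, time_y=15)  # A takes 10 s, B takes 15 s
print(f"A is {n} times faster than B")  # prints: A is 1.5 times faster than B
```

The execution times swap places in the ratio because performance is the inverse of time.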

### Measuring execution time

- Elapsed time
  - Total response time, including all aspects
    - Processing, I/O, OS overhead, idle time
  - Determines system performance

- CPU time (Focus here for now)
  - Time spent by the processor on a given job
    - Ignores I/O time and the time spent on other jobs
  - Consists of user CPU time and system (OS) CPU time
  - Different programs are affected differently by CPU and system performance

## CPU clocking

Operation of digital hardware is governed by a constant-rate clock



- Clock period: duration of a clock cycle
  - e.g., 250 ps = 0.25 ns = 250×10<sup>-12</sup> s
- Clock frequency (rate): cycles per second
  - e.g., 4.0 GHz = 4000 MHz = 4.0×10<sup>9</sup> Hz
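The two examples above describe the same clock, since period and frequency are reciprocals. A minimal sketch of the conversion follows; the function names and the choice of picoseconds and gigahertz as units are illustrative.

```python
def period_ps_from_ghz(frequency_ghz):
    """Clock period in picoseconds for a clock rate given in GHz.

    1 GHz = 1e9 cycles/s, so one cycle lasts 1e-9 s = 1000 ps.
    """
    return 1000 / frequency_ghz


def ghz_from_period_ps(period_ps):
    """Clock rate in GHz for a clock period given in picoseconds."""
    return 1000 / period_ps


print(period_ps_from_ghz(4.0))   # prints: 250.0
print(ghz_from_period_ps(250))   # prints: 4.0
```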

### CPU time

- CPU time = Clock cycles × Clock cycle time = Clock cycles / Clock rate
- Performance improved by
  - Reducing number of clock cycles
  - Increasing clock rate
- Hardware designer must often trade off clock rate against cycle count

### Instruction count and CPI

- Instruction Count for a program
  - Determined by program, ISA and compiler
- Average cycles per instruction
  - Determined by CPU hardware
  - If different instructions have different CPI
    - Average CPI affected by instruction mix

$$Clock\ Cycles = Instruction\ Count \times Cycles\ per\ Instruction$$

$$CPU\ Time = Instruction\ Count \times CPI \times Clock\ Cycle\ Time = \frac{Instruction\ Count \times CPI}{Clock\ Rate}$$
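A minimal sketch of the CPU time equation shows how each of the three factors scales the result. All numbers here are hypothetical and chosen only to keep the arithmetic simple; the function name is illustrative.

```python
def cpu_time_seconds(instruction_count, cpi, clock_rate_hz):
    """CPU Time = Instruction Count x CPI / Clock Rate."""
    clock_cycles = instruction_count * cpi
    return clock_cycles / clock_rate_hz


# Hypothetical baseline: 1e9 instructions, CPI of 2.0, 4 GHz clock
base = cpu_time_seconds(1_000_000_000, 2.0, 4_000_000_000)

# Improving any single factor by 2x halves the CPU time
fewer_instructions = cpu_time_seconds(500_000_000, 2.0, 4_000_000_000)
lower_cpi = cpu_time_seconds(1_000_000_000, 1.0, 4_000_000_000)
faster_clock = cpu_time_seconds(1_000_000_000, 2.0, 8_000_000_000)

print(base, fewer_instructions, lower_cpi, faster_clock)
```

In practice the factors are not independent. For example, a change that raises the clock rate may also raise CPI, which is the trade-off the slides return to.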

### CPI example

- Computer A: Cycle Time = 250ps, CPI = 2.0
- Computer B: Cycle Time = 500ps, CPI = 1.2
- Same ISA
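The slide does not spell out the question. Assuming it asks which computer is faster, and that both run the same program with some instruction count $I$ (a symbol introduced here for illustration), the comparison can be worked out as follows:

$$CPU\ Time_A = I \times 2.0 \times 250\,ps = 500 \times I\ ps$$

$$CPU\ Time_B = I \times 1.2 \times 500\,ps = 600 \times I\ ps$$

$$\frac{Performance_A}{Performance_B} = \frac{CPU\ Time_B}{CPU\ Time_A} = \frac{600 \times I\ ps}{500 \times I\ ps} = 1.2$$

Under these assumptions A is 1.2 times faster than B, even though its CPI is higher, because its cycle time is half as long.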

### CPI in more detail

If different instruction classes take different numbers of cycles:

$$Clock\ Cycles = \sum_{i=1}^{n} (CPI_i \times Instruction\ Count_i)$$

Weighted average CPI:

$$CPI = \frac{Clock\ Cycles}{Instruction\ Count} = \sum_{i=1}^{n} \left( CPI_i \times \frac{Instruction\ Count_i}{Instruction\ Count} \right)$$

Here $\frac{Instruction\ Count_i}{Instruction\ Count}$ is the relative frequency of instruction class $i$ (dynamic profiling!).

\*Not always true with microarchitectural tricks (pipelining, superscalar, ...)
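The weighted average can be sketched in a few lines. The three instruction classes, their CPIs, and the dynamic instruction counts below are hypothetical values chosen for illustration, not measurements from the lecture.

```python
def total_cycles(cpi_by_class, count_by_class):
    """Clock Cycles = sum over classes of CPI_i x Instruction Count_i."""
    return sum(cpi_by_class[c] * count_by_class[c] for c in count_by_class)


def average_cpi(cpi_by_class, count_by_class):
    """CPI = Clock Cycles / Instruction Count."""
    return total_cycles(cpi_by_class, count_by_class) / sum(count_by_class.values())


def weighted_cpi(cpi_by_class, count_by_class):
    """Same CPI, computed as sum of CPI_i x relative frequency of class i."""
    total = sum(count_by_class.values())
    return sum(cpi_by_class[c] * count_by_class[c] / total for c in count_by_class)


# Hypothetical per-class CPIs and dynamic counts for two code sequences
cpi = {"A": 1, "B": 2, "C": 3}
sequence_1 = {"A": 2, "B": 1, "C": 2}
sequence_2 = {"A": 4, "B": 1, "C": 1}

for name, counts in (("sequence 1", sequence_1), ("sequence 2", sequence_2)):
    print(name, total_cycles(cpi, counts), average_cpi(cpi, counts))
```

With this mix, the second sequence executes more instructions (6 against 5) but needs fewer cycles (9 against 10), because it relies more on the cheapest class.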

### Performance summary

- Performance depends on
  - Algorithm: affects Instruction count, (possibly CPI)
  - Programming language: affects Instruction count, (possibly CPI)
  - Compiler: affects Instruction count, CPI
  - Instruction set architecture: affects Instruction count, CPI, Clock speed

$$CPU \, Time = \frac{Instructions}{Program} \times \frac{Clock \, cycles}{Instruction} \times \frac{Seconds}{Clock \, cycle}$$

### Some goals for a good ISA

| Low instruction count                | Low CPI, high clock speed          |
|--------------------------------------|------------------------------------|
| Each instruction should do more work | Each instruction should be simpler |

How do we reconcile?

### Real-world examples: Intel i7 and ARM Cortex-A53





CPI of ARM Cortex-A53 on SPEC2006 Benchmarks

(High CPI is not inherent to ARM. The newer Cortex-A78 ideally sustains 6 instructions per cycle, i.e., a CPI of 1/6)